Zebrafish fusion samples /gpfs0/home1/gdkendalllab/lab/raw_data/fastq/2016_01_01/ sample tumor fusion background A667T19 D738 VGLL2-NCOA2 P53-/- A667T20 D739 VGLL2-NCOA2 AB/TL A667T21 D742 VGLL2-NCOA2 P53-/- A667T22 D777 VGLL2-NCOA2 AB/TL A667T23 D799 VGLL2-NCOA2 AB/TL A667T24 D800 VGLL2-NCOA2 AB/TL A667T25 D801 VGLL2-NCOA2 AB/TL A667T26 D807 VGLL2-NCOA2 AB/TL A667T27 D808 VGLL2-NCOA2 AB/TL A667T28 D813-Head VGLL2-NCOA2 AB/TL A667T29 D813-Tail VGLL2-NCOA2 AB/TL D809 D809 VGLL2-NCOA2 AB/TL D811 D811 VGLL2-NCOA2 AB/TL D812-head D812-head VGLL2-NCOA2 AB/TL D812-tail D812-tail VGLL2-NCOA2 AB/TL D814 D814 VGLL2-NCOA2 AB/TL D815 D815 VGLL2-NCOA2 AB/TL D874 D874 VGLL2-NCOA2 AB/TL
Zebrafish P53/Kras mutant fish SRR6507311 mutant SRR6507312 mutant SRR6507313 mutant SRR6507300 whole fish SRR6507301 whole fish SRR6507302 whole fish
Zebrafish normal muscle /home/gdkendalllab/lab/raw_data/fastq/2016_01_02 A374-A375T23 A374-A375T24 D948 muscle D949 muscle D950 muscle D951 muscle D952 muscle
Zebrafish CICDUX4 tumors D655 D668 D669 D678 D681 D682 D684 D685 D686 D687 D688 D689
Mouse data from fusion in C2C12 cells /gpfs0/home1/gdkendalllab/lab/raw_data/fastq/2015_12_22/ A377-A378T1 - A377-A378T8 T7 and T8 are controls with C2C12 without fusion expression
Mouse normal skeletal muscle from SRA PRJNA625451 PRJNA608179 - Miiice in spaaaaace PRJNA819493 - 14, 36 weeks PRJNA813153
Human data from patient fusion tumors /gpfs0/home1/gdkendalllab/lab/raw_data/fastq/2016_12_14/ EGAR00001508614_SARC061 EGAR00001508618_SARC065 EGAR00001508623_SARC070-Primary EGAR00001508624_SARC070-Relapse1 EGAR00001508656_SARC102
Normal human skeletal muscle data from GTEx https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/gene_reads/gene_reads_2017-06-05_v8_muscle_skeletal.gct.gz
Figure 2 of paper - moving to supplement and replace with comparison vs adult skeletal muscle ***
Allograft Figure 6 - Likely end up in supplemental data scatterplot of C2C12-VN2 vs adult muscle (maybe C2C12 control) expression vs VN2 human tumors vs adult muscle pathways co-enriched in both comparisons *** Kras vs our tumors PCA - include RNAseq despite P53 include adult muscle to check for batch effect heatmap scatterplot of Kras vs our tumors expression Fold change values do GSEA or pathway enrichment analysis on shared DE genes and uniquely DE genes
Level of fusion gene expression between mouse, zebrafish and human
do PCA/umap of Dr tumor samples
To analyze these samples, we aligned with HiSat2 and counted the number of reads mapping to each gene using featureCounts. We used EdgeR to test for differential expression between the groups. For most statistical analyses and plotting, we used R and ggplot2.
Based on group names from sampleMetadata.xlsx
DrFusion vs DrSkMu DrKras vs DrKrasCtrl MmFusion vs MmSkMu HsFusion vs HsSkMu
AGDEX analysis
VGLL2NCOA2 human and fish tumors vs. mature skeletal muscle
VGLL2NCOA2 human and fish tumors vs. KRAS driven RMS
Mouse VGLL2NCOA2 and fish tumors vs. mature skeletal muscle
Motif analysis (done)
VGLL2-NCOA2 expression in the mouse xenograft models
Determining reads that span the fusion breakpoint/levels of expression in the VN2 mouse allograft models.
27 developmental gene signature
Are these genes expressed in mouse VGLL2-NCOA2 tumors compared to C2C12 controls?
Are these genes expressed in patient VGLL2-NCOA2 tumors compared to mature skeletal muscle?
Chip-Seq analysis from zebrafish tumors and C2C12 (as data is generated)
Tead motifs upstream of 27 developmental genes
iRDB from St. Jude Arf6 expression across tumors expression vs gene effect scatterplot
library(tidyverse)
theme_set(theme_bw())
theme_update(plot.title = element_text(hjust = 0.5))
options(bitmaptype = "cairo")
library(gt)
library(edgeR)
plot_tabs <- function(plot_list_name, level = "###") {
knitr_chunk_list <-
lapply(seq_along(get(plot_list_name)), function(i) {
tab_name <- names(get(plot_list_name))[i]
a1 <- knitr::knit_expand(text = sprintf(paste0("\n",
level,
" %s\n"),
tab_name))
a2 <- knitr::knit_expand(text = sprintf("\n```{r %s, message=FALSE, warning=FALSE}",
paste0(tab_name,
"_",
i)))
a3 <- knitr::knit_expand(text = sprintf(paste0("\n",
plot_list_name,
"[['%s']]"),
tab_name))
a4 <- knitr::knit_expand(text = "\n```\n\n")
paste(a1, a2, a3, a4, collapse = "\n")
})
paste(knitr::knit(text = paste(knitr_chunk_list, collapse = "\n")))
}
for directoryName in \
misc \
slurmOut \
sbatchCmds \
input \
output/fastqc \
output/figures \
output/kallisto \
output/fusionAligned \
output/aligned \
output/aligned/mmSkMu/ \
output/aligned/mmVGLL/ \
output/aligned/Hs \
output/aligned/drKras \
output/aligned/drVGLL \
output/de \
output/geneCounts \
output/trimmed
do
if [ ! -d ${directoryName} ]
then
mkdir -p ${directoryName}
fi
done
#!/bin/sh
#SBATCH --account=gdkendalllab
#SBATCH --error=slurmOut/fastqc-%j.txt
#SBATCH --output=slurmOut/fastqc-%j.txt
#SBATCH --mem=2G
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5
#SBATCH --job-name fastqc
#SBATCH --wait
#SBATCH --array=0-15
#SBATCH --time=1-00:00:00
#SBATCH --mail-user=matthew.cannon@nationwidechildrens.org
#SBATCH --mail-type=FAIL,REQUEUE,TIME_LIMIT_80
set -e ### stops bash script if line ends with error
echo ${HOSTNAME} ${SLURM_ARRAY_TASK_ID}
module purge
module load FastQC/0.11.9-Java-11.0.2
inputPath=/home/gdkendalllab/lab/raw_data/fastq/2015_12_22
fileArray=(${inputPath}/*fastq.gz)
fastqc \
-o output/fastqc \
-t 5 \
--extract \
${fileArray[${SLURM_ARRAY_TASK_ID}]}
#!/usr/bin/perl
use warnings;
use strict;
use Getopt::Long;
use Pod::Usage;
##############################
# By Matt Cannon
# Date: 6-22-21
# Last modified: 6-22-21
# Title: splitFastqcData.pl
# Purpose: Split fastqc_data.txt into sub-files based on modules
##############################
##############################
# Options
##############################
my $verbose;
my $help;
my $input;
my $outputDir = "";
# i = integer, s = string
GetOptions ("verbose" => \$verbose,
"help" => \$help,
"input=s" => \$input,
"outputDir=s" => \$outputDir
) or pod2usage(0) && exit;
pod2usage(1) && exit if ($help);
##############################
# Global variables
##############################
my $outputFH;
my $baseColumn;
##############################
# Code
##############################
if ($outputDir eq "") {
$input =~ /^(.+\/)/;
$outputDir = $1;
mkdir $outputDir . "/split_data"
}
##############################
### Stuff
### More stuff
open my $inputFH, "$input" or die "Could not open first input\n$!";
while (my $input = <$inputFH>){
chomp $input;
if ($input =~ /^>>END_MODULE/) {
# Close file
close $outputFH;
undef $baseColumn;
} elsif ($input =~ /^>>/) {
# Open new file
$input =~ s/^>>//;
$input =~ s/ /_/g;
$input =~ s/\t.+//;
open $outputFH, ">", $outputDir . "/split_data/" . $input . ".txt";
} elsif ($input !~ /^##/) {
# Deal with data
## Find column that has base numbering
if ($input =~ /Base|Position/ && $input =~ /^#/) {
# Change the word Position to Base to make downstream plotting easier
$input =~ s/Position/Base/;
my @dataArray = split "\t", $input;
for (my $i = 0; $i < scalar(@dataArray); $i++) {
if ($dataArray[$i] =~ /^#*Base$/) {
$baseColumn = $i;
}
}
}
# write to file
if (defined($baseColumn)) {
my @dataArray = split "\t", $input;
if ($dataArray[$baseColumn] =~ /-/) {
my ($lowerNum, $upperNum) = split "-", $dataArray[$baseColumn];
for (my $i = $lowerNum; $i <= $upperNum; $i++) {
my @tempArray = @dataArray;
$tempArray[$baseColumn] = $i;
print $outputFH join("\t", @tempArray), "\n";
}
} else {
print $outputFH $input, "\n";
}
} else {
print $outputFH $input, "\n";
}
}
}
##############################
# POD
##############################
#=pod
=head SYNOPSIS
Summary:
xxxxxx.pl - generates a consensus for a specified gene in a specified taxa
Usage:
perl xxxxxx.pl [options]
=head OPTIONS
Options:
--verbose
--help
=cut
sbatch sbatchCmds/fastqc.sh
rm output/fastqc/*.zip
for inFile in output/fastqc/*/fastqc_data.txt
do
perl ~/scripts/splitFastqcData.pl -i ${inFile}
done
adapter_file_dirs <- list.dirs(
path = "output/fastqc",
full.names = TRUE,
recursive = FALSE
)
to_plot <- tibble(
file = c(
"Adapter_Content.txt",
"Per_base_sequence_quality.txt"
),
column = c(
"Nextera_Transposase_Sequence",
"Mean"
)
)
for (i in 1:nrow(to_plot)) {
temp_data <- NULL
for (data_dir in adapter_file_dirs) {
temp_data <- read_delim(paste(data_dir,
"/split_data/",
to_plot$file[i],
sep = ""),
delim = "\t",
col_names = TRUE,
show_col_types = FALSE) %>%
rename_all(~ str_replace_all(., " ", "_")) %>%
rename_all(~ str_remove(., "[#']")) %>%
select(Base, to_plot$column[i]) %>%
mutate(Sample = str_remove(data_dir, "_001_fastqc") %>%
str_remove(".+\\/")) %>%
mutate(Read = str_remove(Sample, ".+_")) %>%
mutate(Tech = str_remove(Sample, "_.+")) %>%
bind_rows(temp_data)
}
max_y <- ceiling(max(temp_data[[to_plot$column[i]]]) * 1.1)
plot_fastqc <- ggplot(temp_data,
aes(x = Base,
y = get(to_plot$column[i]),
group = Sample,
color = Sample)) +
geom_line() +
geom_point() +
facet_wrap(~Tech, ncol = 1) +
ylim(0, max_y) +
ylab(to_plot$column[i]) +
theme(legend.position = "none") +
ggtitle(str_remove(to_plot$file[i], ".txt"))
png(
filename = paste("output/figures/fastqc_",
str_remove(to_plot$file[i], ".txt"),
".png",
sep = ""),
width = 7000,
height = 3000,
res = 300)
print(plot_fastqc)
dev.off()
}
input_reads <- read_tsv(list.files(path = "output/fastqc",
recursive = TRUE,
pattern = "Basic_Statistics.txt",
full.names = TRUE),
id = "File",
show_col_types = FALSE) %>%
filter(`#Measure` == "Total Sequences") %>%
select(-`#Measure`) %>%
mutate(
File = str_remove(File, "output/fastqc/") %>%
str_remove("_001_fastqc/split_data/Basic_Statistics.txt"),
Sample = str_remove(File, "_S[0-9].+"),
Lane = str_remove(File, ".+_L00") %>%
str_remove("_R[12]"),
Read = str_replace(File, ".+_R", "R"),
Value = as.numeric(Value)) %>%
rename(Reads = Value)
ggplot(input_reads, aes(x = Sample, y = Reads / 1000000, fill = Lane)) +
geom_col() +
facet_wrap(~Read, ncol = 1) +
ylab("Reads (in millions)") +
xlab("") +
ggtitle("Sequenced Reads Per Sample") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggsave("output/figures/InputReads.png",
height = 3000,
width = 4000,
units = "px"
)
https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/gene_reads/gene_reads_2017-06-05_v8_muscle_skeletal.gct.gz
Edited the file by hand to get rid of extraneous header info
sbatch sbatchCmd/genStarRef.sh
{bash starRef, eval=FALSE sbatch ref/starhg38.p4/genStarRef.sh
sbatch ref/fusionRef/genStarFusionRef.sh
#!/bin/sh
#SBATCH --account=gdkendalllab
#SBATCH --array=0-7
#SBATCH --error=slurmOut/alignMmVGLL-%j.txt
#SBATCH --output=slurmOut/alignMmVGLL-%j.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --job-name alignMmVGLL
#SBATCH --wait
#SBATCH --mail-user=matthew.cannon@nationwidechildrens.org
#SBATCH --mail-type=FAIL,REQUEUE,TIME_LIMIT_80
#SBATCH --time=2-00:00:00
set -e ### stops bash script if line ends with error
echo ${HOSTNAME} ${SLURM_ARRAY_TASK_ID}
ml purge
ml load GCCcore/8.3.0 \
Trim_Galore/0.6.5-Java-11.0.2-Python-3.7.4
inputPath=/home/gdkendalllab/lab/raw_data/fastq/2015_12_22
tempPath=/gpfs0/scratch/mvc002/kendall
fileArray=(${inputPath}/*R1.fastq.gz)
R1=${fileArray[${SLURM_ARRAY_TASK_ID}]}
R2=${R1/R1.fastq/R2.fastq}
baseName=${R1%.R1.fastq.gz}
baseName=${baseName##*/}
trim_galore \
--length 30 \
-o ${tempPath} \
-j 10 \
--paired \
${R1} \
${R2}
# Hard trim to 50bp to keep consistent across samples
trimLen=150
perl ~/oldCode/oldScripts/trimFastq.pl \
--max ${trimLen} \
--fastq ${tempPath}/${baseName}.R1_val_1.fq.gz |
pigz \
> ${tempPath}/${baseName}_1.fastq.gz
perl ~/oldCode/oldScripts/trimFastq.pl \
--max ${trimLen} \
--fastq ${tempPath}/${baseName}.R2_val_2.fq.gz |
pigz \
> ${tempPath}/${baseName}_2.fastq.gz
ml purge
ml load GCC/7.3.0-2.30 \
OpenMPI/3.1.1 \
SAMtools/1.9
STAR \
--runMode alignReads \
--outSAMtype BAM SortedByCoordinate \
--runThreadN 10 \
--outFilterMultimapNmax 1 \
--readFilesCommand zcat \
--genomeDir ref/starmm10 \
--readFilesIn ${tempPath}/${baseName}_1.fastq.gz \
${tempPath}/${baseName}_2.fastq.gz \
--outFileNamePrefix output/aligned/mmVGLL/${baseName##*/}
samtools index output/aligned/mmVGLL/${baseName##*/}Aligned.sortedByCoord.out.bam
rm ${tempPath}/${baseName}.R[12]_val_[12].fq.gz \
${tempPath}/${baseName}.R[12].fastq.gz_trimming_report.txt \
${tempPath}/${baseName}_[12].fastq.gz
STAR --version
sbatch sbatchCmds/alignMmVGLL.sh
sbatch sbatchCmds/alignMmSkMuscle.sh
sbatch sbatchCmds/alignDrKras.sh
sbatch sbatchCmds/alignDrVGLL.sh
sbatch sbatch/alignHs.sh
sbatch sbatchCmds/alignFusion.sh
#!/bin/sh
#SBATCH --account=gdkendalllab
#SBATCH --error=slurmOut/countGenesMm-%j.txt
#SBATCH --output=slurmOut/countGenesMm-%j.txt
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --job-name countGenesMm
#SBATCH --wait
#SBATCH --array=0-27
#SBATCH --mail-user=matthew.cannon@nationwidechildrens.org
#SBATCH --mail-type=FAIL,REQUEUE,TIME_LIMIT_80
#SBATCH --time=1-24:00:00
set -e ### stops bash script if line ends with error
echo ${HOSTNAME} ${SLURM_ARRAY_TASK_ID}
fileArray=(output/aligned/mmVGLL/*.bam
output/aligned/mmSkMu/*Aligned.sortedByCoord.out.bam)
inFile=${fileArray[${SLURM_ARRAY_TASK_ID}]}
baseName=${inFile##*/}
baseName=${baseName%Aligned.sortedByCoord.out.bam}
ml purge
ml load HTSeq/0.12.4
htseq-count \
-n 4 \
-t exon \
-c output/geneCounts/geneCountsMm_${baseName}.txt \
-r pos \
-s no \
${inFile} \
/reference/mus_musculus/mm10/ucsc_assmebly/illumina_download/Annotation/Genes/genes.gtf
sbatch sbatchCmds/countMmGenes.sh
sbatch sbatchCmds/countHsGenes.sh
sbatch sbatchCmds/countDrGenes.sh
grep_counts <- function(query, name) {
system(paste0("grep '", query, "' output/aligned/*/*Log.final.out"),
intern = TRUE) %>%
as_tibble() %>%
separate(col = value,
into = c("sample", name),
sep = "\t") %>%
mutate(sample = str_remove(sample, "output/aligned/") %>%
str_remove("Log.final.out.+"))
}
aligned_counts <-
grep_counts("Number of input reads", "input_reads") %>%
full_join(grep_counts("Uniquely mapped reads number", "mapped_reads"),
by = "sample") %>%
full_join(grep_counts("Uniquely mapped reads %", "perc_mapped"),
by = "sample") %>%
separate(col = sample,
into = c("project", "sample"),
sep = "/")
for species_abr in c("Mm", "Hs", "Dr") {
tibble(assigned_reads =
read_tsv(paste0("output/geneCounts/geneCounts", species_abr, ".txt"),
col_names = FALSE,
show_col_types = FALSE) %>%
select(-X1) %>%
colSums(),
sample = system(""))
}
read_tsv(list.files(path = "output/aligned/counts",
pattern = "_counts.txt",
full.names = TRUE),
id = "Sample",
col_names = "aligned_reads",
show_col_types = FALSE) %>%
mutate(Sample = str_remove(Sample, "output/aligned/counts/") %>%
str_remove("_counts.txt"))
ggplot(aligned_counts, aes(x = Sample, y = aligned_reads / 1000000)) +
geom_col() +
labs(x = "",
y = "Reads Aligned (in millions)",
title = "Number of Reads Aligned to the Genome") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggsave("output/figures/CountAligned.png",
height = 3000,
width = 4000,
units = "px")
Remember to kick out the last few rows which have summary numbers Columns are arranged in alphabetic order by sample, double checked this by calculating colsums on each and comparing to output/aligned/*.final.out
Ortholog sources:
http://www.informatics.jax.org/downloads/reports/HOM_AllOrganism.rpt https://www.genenames.org/tools/hcop/ http://www.ensembl.org/biomart/martview/ https://www.alliancegenome.org/downloads
for (species in c("Dr", "Mm", "Hs")) {
gene_counts <- read_tsv(list.files(path = "output/geneCounts",
pattern = paste0("geneCounts",
species,
".+.txt$"),
full.names = TRUE),
id = "sample",
col_names = c("gene", "count"),
show_col_types = FALSE) %>%
mutate(sample = str_remove(sample,
paste0("output/geneCounts/geneCounts",
species,
"_")) %>%
str_remove(".txt")) %>%
filter(grepl("^__", gene) == FALSE) %>%
pivot_wider(names_from = "sample",
values_from = "count",
values_fill = 0)
# keep only genes in the counts files for human
if (species == "Hs") {
gene_counts <-
inner_join(gene_counts,
read_tsv("input/gene_reads_2017-06-05_v8_muscle_skeletal.gct.gz",
show_col_types = FALSE) %>%
select(-id, -Name) %>%
group_by(Description) %>%
summarize(across(where(is.numeric), sum)) %>%
rename(gene = Description)) %>%
mutate(across(everything(), ~replace_na(.x, 0)))
}
# Keep only genes that are in the tumor samples since some of the samples
# were prepped with a different library prep method
if (species == "Mm") {
sk_mu <-
gene_counts %>%
select(c(gene, starts_with("SRR"))) %>%
rowwise() %>%
mutate(row_sum = sum(c_across(where(is.numeric)))) %>%
ungroup() %>%
filter(row_sum > 0) %>%
select(-row_sum)
tumor <-
gene_counts %>%
select(c(gene, !starts_with("SRR"))) %>%
rowwise() %>%
mutate(row_sum = sum(c_across(where(is.numeric)))) %>%
ungroup() %>%
filter(row_sum > 0) %>%
select(-row_sum)
gene_counts <- inner_join(tumor, sk_mu, by = "gene")
}
write_tsv(gene_counts, paste0("output/geneCounts/geneCountsCombined",
species,
".txt.gz"))
}
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE109nnn/GSE109581/suppl/GSE109581%5FcountsMatrix%2Etxt%2Egz
VGLL2NCOA2 tumors KRAS GSE39731; GSE32425), CICDUX4 (us), MPNST (GSE11493), skeletal muscle (us); and MPNST/ERMS/leukemia/others from fish (GSE109581)
metadata <- read_tsv("misc/sampleMetadata.txt",
show_col_types = FALSE)
for (species in c("Dr", "Mm", "Hs")) {
gene_counts <- read_tsv(paste0("output/geneCounts/geneCountsCombined",
species,
".txt.gz"),
show_col_types = FALSE) %>%
column_to_rownames("gene")
# Add GTEX data to the metadata
if (species == "Hs") {
metadata <- metadata %>%
filter(!grepl("GTEX", sample)) %>%
bind_rows(tibble(sample = colnames(gene_counts)[grep("GTEX",
colnames(gene_counts))],
group = "HsSkMu",
bioproject = "GTEX"))
}
groups <- metadata[metadata %>%
pull(sample) %>%
match(colnames(gene_counts), .), ] %>%
pull(group)
data_dge <- DGEList(counts = gene_counts, group = groups)
kept_df <- filterByExpr(data_dge, min.count = 0.1)
data_dge <- data_dge[kept_df, keep.lib.sizes = FALSE] %>%
calcNormFactors(.) %>%
estimateDisp(., model.matrix(~ groups))
dim(data_dge$counts)
norm_counts <-
cpm(data_dge) %>%
as.data.frame() %>%
t()
pca_out <- prcomp(norm_counts)
pca_out_merged <-
left_join(as.data.frame(pca_out$x) %>%
rownames_to_column("sample"),
metadata)
variance <- (pca_out$sdev)^2 / sum(pca_out$sdev^2) * 100
for (color_group in c("group", "bioproject")) {
ggplot(pca_out_merged,
aes(x = PC1,
y = PC2,
color = get(color_group),
label = sample)) +
geom_point(size = 2.5, color = "black") +
geom_point(size = 2) +
ggrepel::geom_text_repel(size = 2, max.overlaps = 20) +
scale_color_brewer(palette = "Set2",
name = stringr::str_to_title(color_group)) +
xlab(paste0("PC1 (",
round(variance[1], 1),
"%)")) +
ylab(paste0("PC2 (",
round(variance[2], 1),
"%)")) +
ggtitle(paste("PCA on", species))
ggsave(paste0("output/figures/PCA_",
species,
"_",
color_group,
".png"),
height = 10,
width = 13)
pca_out_merged %>%
select({{ color_group }}, paste0("PC", 1:9)) %>%
pivot_longer(cols = -{{ color_group }},
names_to = "pc",
values_to = "pc_value") %>%
group_by(pc) %>%
mutate(scaled_pc_value = scale(pc_value)) %>%
ggplot(aes(x = pc,
y = scaled_pc_value,
color = get(color_group))) +
geom_boxplot(outlier.shape = NA) +
geom_point(position = position_jitterdodge(jitter.width = 0.05),
size = 0.5) +
scale_color_brewer(palette = "Set2",
name = stringr::str_to_title(color_group)) +
ggtitle("Separation of Groups by PCs")
ggsave(paste0("output/figures/PCA_",
species,
"_",
color_group,
"_boxplot.png"),
height = 8,
width = 13)
}
}
## Joining, by = "sample"
## Warning: ggrepel: 22 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: Removed 43 rows containing missing values (geom_point).
## Warning: ggrepel: 22 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: Removed 387 rows containing missing values (geom_point).
## Joining, by = "sample"
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning: Removed 72 rows containing missing values (geom_point).
## Joining, by = "sample"
## Warning: ggrepel: 556 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: ggrepel: 554 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Warning: Removed 45 rows containing missing values (geom_point).
metadata <- read_tsv("misc/sampleMetadata.txt",
show_col_types = FALSE)
gene_counts <- read_tsv(paste0("output/geneCounts/geneCountsCombinedDr.txt.gz"),
show_col_types = FALSE) %>%
column_to_rownames("gene") %>%
select(-matches("Muscle"),
-`A374-A375T23`,
-`A374-A375T24`)
groups <- metadata[metadata %>%
pull(sample) %>%
match(colnames(gene_counts), .), ] %>%
pull(group)
data_dge <- DGEList(counts = gene_counts, group = groups)
kept_df <- filterByExpr(data_dge, min.count = 0.1)
data_dge <- data_dge[kept_df, keep.lib.sizes = FALSE] %>%
calcNormFactors(.) %>%
estimateDisp(., model.matrix(~ groups))
dim(data_dge$counts)
## [1] 26971 36
norm_counts <-
cpm(data_dge) %>%
as.data.frame() %>%
t()
### Heatmaps
png("output/figures/heatmap_zebrafish_tumors.png",
width = 2000,
height = 2000,
res = 300)
pheatmap::pheatmap(t(norm_counts),
color = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7,
name = "PuOr")))(100),
show_rownames = FALSE,
scale = "row",
fontsize = 8,
annotation_names_col = FALSE,
annotation_col = data.frame(Group = groups,
row.names = colnames(gene_counts)))
dev.off()
## png
## 3
### PCA
pca_out <- prcomp(norm_counts)
pca_out_merged <-
left_join(as.data.frame(pca_out$x) %>%
rownames_to_column("sample"),
metadata)
## Joining, by = "sample"
variance <- (pca_out$sdev)^2 / sum(pca_out$sdev^2) * 100
ggplot(pca_out_merged,
aes(x = PC1,
y = PC2,
color = group,
label = sample)) +
geom_point(size = 2.5, color = "black") +
geom_point(size = 2) +
ggrepel::geom_text_repel(size = 2, max.overlaps = 20) +
scale_color_brewer(palette = "Set2",
name = "Group") +
xlab(paste0("PC1 (",
round(variance[1], 1),
"%)")) +
ylab(paste0("PC2 (",
round(variance[2], 1),
"%)")) +
ggtitle("PCA on Zebrafish Tumors")
## Warning: ggrepel: 18 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
ggsave(paste0("output/figures/PCA_Dr_tumors.png"),
height = 10,
width = 13)
## Warning: ggrepel: 13 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
pca_out_merged %>%
select(group, paste0("PC", 1:9)) %>%
pivot_longer(cols = -group,
names_to = "pc",
values_to = "pc_value") %>%
group_by(pc) %>%
mutate(scaled_pc_value = scale(pc_value)) %>%
ggplot(aes(x = pc,
y = scaled_pc_value,
color = group)) +
geom_boxplot(outlier.shape = NA) +
geom_point(position = position_jitterdodge(jitter.width = 0.05),
size = 0.5) +
scale_color_brewer(palette = "Set2",
name = stringr::str_to_title(color_group)) +
ggtitle("Separation of Groups by PCs")
ggsave(paste0("output/figures/PCA_Dr_boxplot.png"),
height = 8,
width = 13)
set.seed(1337)
umap_out <- umap::umap(norm_counts, n_neighbors = 4)$layout %>%
as.data.frame() %>%
rownames_to_column("sample") %>%
as_tibble() %>%
left_join(metadata)
## Joining, by = "sample"
centroids <- umap_out %>%
group_by(group) %>%
summarize(V1 = mean(V1),
V2 = mean(V2),
.groups = "drop")
ggplot(umap_out,
aes(x = V1,
y = V2,
color = group)) +
geom_point(size = 4) +
ggrepel::geom_text_repel(data = centroids,
aes(x = V1,
y = V2,
label = group),
color = "black",
force = 50) +
scale_color_brewer(palette = "Paired",
name = "Group") +
ggrepel::geom_text_repel(aes(label = sample))
ggsave(paste0("output/figures/umap_Dr_tumors_groups.png"),
height = 8,
width = 15)
merged_data <- tibble(Dr = character())
species_full <- list(Hs = "Homo sapiens",
Dr = "Danio rerio",
Mm = "Mus musculus")
for (species in c("Dr", "Mm", "Hs")) {
gene_counts <- read_tsv(paste0("output/geneCounts/geneCountsCombined",
species,
".txt.gz"),
show_col_types = FALSE) %>%
rename({{ species }} := gene)
if (species != "Dr") {
gene_counts <- right_join(read_tsv("../refs/ORTHOLOGY-ALLIANCE_COMBINED.tsv.gz",
show_col_types = FALSE,
comment = "#") %>%
dplyr::filter(Gene1SpeciesName == species_full[["Dr"]] &
Gene2SpeciesName == species_full[[species]]) %>%
select(Gene1Symbol, Gene2Symbol) %>%
rename(Dr = Gene1Symbol,
{{ species }} := Gene2Symbol) %>%
distinct(),
gene_counts)
} else {
gene_counts <- right_join(read_tsv("/home/gdkendalllab/lab/references/annotations/zebrafish_gene_info.txt.gz",
show_col_types = FALSE,
comment = "#") %>%
select(Gene_name, Gene_stable_ID_version) %>%
rename(Dr = Gene_stable_ID_version) %>%
distinct(),
gene_counts) %>%
select(-Dr) %>%
rename(Dr = Gene_name) %>%
filter(!is.na(Dr)) %>%
group_by(Dr) %>%
summarize(across(where(is.numeric), sum), .groups = "drop")
}
merged_data <- full_join(merged_data, gene_counts) %>%
na.omit()
}
merged_data <-
merged_data %>%
mutate(gene = paste(Dr, Mm, Hs, sep = "/"), .before = 1) %>%
select(-Dr, -Mm, -Hs) %>%
column_to_rownames("gene")
metadata <-
read_tsv("misc/sampleMetadata.txt",
show_col_types = FALSE) %>%
select(sample, group) %>%
filter(!grepl("GTEX", sample)) %>%
bind_rows(tibble(sample = colnames(gene_counts)[grep("GTEX",
colnames(gene_counts))],
group = "HsSkMu",
bioproject = "GTEX")) %>%
arrange(match(sample, colnames(merged_data))) %>%
mutate(species = substr(group, 1, 2))
groups <- metadata$group
data_dge <- DGEList(counts = merged_data, group = groups)
kept_df <- filterByExpr(data_dge, min.count = 0.1)
data_dge <- data_dge[kept_df, keep.lib.sizes = FALSE] %>%
calcNormFactors(.) %>%
estimateDisp(., model.matrix(~ groups))
dim(data_dge$counts)
norm_counts <-
cpm(data_dge) %>%
as.data.frame() %>%
t()
### Heatmaps
set.seed(1337)
non_gtex_samples <- grep("GTEX", rownames(norm_counts), invert = TRUE)
gtex_samples_20 <- grep("GTEX", rownames(norm_counts)) %>%
sample(size = 20)
sub_gtex <- norm_counts[c(non_gtex_samples, gtex_samples_20),]
annot_df <- data.frame(Group = groups[c(non_gtex_samples, gtex_samples_20)],
row.names = colnames(merged_data)[c(non_gtex_samples, gtex_samples_20)]) %>%
mutate(Species = substr(Group, 1, 2))
png("output/figures/heatmap_all.png",
width = 3500,
height = 2000,
res = 300)
pheatmap::pheatmap(t(sub_gtex),
color = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7,
name = "PuOr")))(100),
show_rownames = FALSE,
scale = "row",
fontsize = 8,
annotation_names_col = FALSE,
annotation_col = annot_df)
dev.off()
### PCA
pca_out <- prcomp(norm_counts)
pca_out_merged <-
left_join(as.data.frame(pca_out$x) %>%
rownames_to_column("sample"),
metadata)
variance <- (pca_out$sdev)^2 / sum(pca_out$sdev^2) * 100
ggplot(pca_out_merged,
aes(x = PC1,
y = PC2,
color = group)) +
geom_point(size = 2.5, color = "black") +
geom_point(size = 2) +
scale_color_brewer(palette = "Paired",
name = "Group") +
xlab(paste0("PC1 (",
round(variance[1], 1),
"%)")) +
ylab(paste0("PC2 (",
round(variance[2], 1),
"%)")) +
ggtitle(paste("PCA on", species))
ggsave(paste0("output/figures/PCA_all_groups.png"),
height = 10,
width = 13)
pca_out_merged %>%
select(group, paste0("PC", 1:9)) %>%
pivot_longer(cols = -group,
names_to = "pc",
values_to = "pc_value") %>%
group_by(pc) %>%
mutate(scaled_pc_value = scale(pc_value)) %>%
ggplot(aes(x = pc,
y = scaled_pc_value,
color = group)) +
geom_boxplot(outlier.shape = NA) +
geom_point(position = position_jitterdodge(jitter.width = 0.05),
size = 0.2) +
scale_color_brewer(palette = "Paired",
name = "Group") +
ggtitle("Separation of Groups by PCs")
ggsave(paste0("output/figures/PCA_all_boxplot.png"),
height = 8,
width = 13)
set.seed(1337)
umap_out <- umap::umap(norm_counts)$layout %>%
as.data.frame() %>%
rownames_to_column("sample") %>%
as_tibble() %>%
full_join(metadata)
centroids <- umap_out %>%
group_by(group) %>%
summarize(V1 = mean(V1),
V2 = mean(V2),
.groups = "drop")
ggplot(umap_out,
aes(x = V1,
y = V2,
color = group)) +
geom_point() +
ggrepel::geom_text_repel(data = centroids,
aes(x = V1,
y = V2,
label = group),
color = "black",
force = 50) +
scale_color_brewer(palette = "Paired",
name = "Group")
ggsave(paste0("output/figures/umap_all_groups.png"),
height = 8,
width = 15)
metadata <- read_tsv("misc/sampleMetadata.txt",
show_col_types = FALSE)
de_cmp_table <- tibble(group_1 = c("DrFusion", "DrKras", "MmFusion", "HsFusion"),
group_2 = c("DrSkMu", "DrKrasCtrl", "MmSkMu", "HsSkMu"))
logFC_plot_list <- list()
volcano_plot_list <- list()
scatterplot_list <- list()
for (i in seq_len(nrow(de_cmp_table))) {
## EdgeR portion
species <- substr(de_cmp_table[i, "group_1"], 1, 2)
count_data <- read_tsv(paste0("output/geneCounts/geneCountsCombined",
species,
".txt.gz"),
show_col_types = FALSE)
if (species == "Hs") {
metadata <- metadata %>%
filter(!grepl("GTEX", sample)) %>%
bind_rows(tibble(sample = colnames(count_data)[grep("GTEX",
colnames(count_data))],
group = "HsSkMu",
bioproject = "GTEX"))
}
# swap out gene stable ID for gene name and sum each gene across samples
if (species == "Dr") {
count_data <-
inner_join(count_data,
read_tsv("/home/gdkendalllab/lab/references/annotations/zebrafish_gene_info.txt.gz",
show_col_types = FALSE,
col_select = c("Gene_stable_ID_version",
"Gene_name")) %>%
mutate(gene = Gene_stable_ID_version) %>%
select(-Gene_stable_ID_version) %>%
distinct() %>%
na.omit()) %>%
mutate(gene = Gene_name) %>%
select(-Gene_name) %>%
group_by(gene) %>%
summarize(across(where(is.numeric), sum), .groups = "drop")
}
group_1_samples <- metadata$sample[metadata$group %in%
de_cmp_table$group_1[i]]
group_2_samples <- metadata$sample[metadata$group %in%
de_cmp_table$group_2[i]]
count_data <- count_data[, match(c("gene",
group_1_samples,
group_2_samples),
colnames(count_data))] %>%
column_to_rownames("gene")
groups <- c(rep(de_cmp_table$group_1[i], length(group_1_samples)),
rep(de_cmp_table$group_2[i], length(group_2_samples)))
saveRDS(groups, file = paste0("output/de/",
de_cmp_table$group_1[i],
"_",
de_cmp_table$group_2[i],
"_groups.rds"))
data_dge <- DGEList(counts = count_data, group = groups)
kept_df <- filterByExpr(data_dge, min.count = 0.1)
data_dge <- data_dge[kept_df, keep.lib.sizes = FALSE] %>%
calcNormFactors(.) %>%
estimateDisp(., model.matrix(~ groups))
dim(data_dge$counts)
tested <- exactTest(data_dge)
results <- topTags(tested, n = Inf)
results$table$Gene <- rownames(results$table)
summarized_data <- cpm(data_dge) %>%
as.data.frame() %>%
rownames_to_column(var = "Gene") %>%
as_tibble() %>%
full_join(cpmByGroup(data_dge) %>%
as.data.frame() %>%
rename_all(~ str_c(.x, "_mean")) %>%
rownames_to_column("Gene") %>%
as_tibble(),
by = "Gene") %>%
full_join(results$table, by = "Gene") %>%
rowwise() %>%
mutate(signif = FDR <= 0.1)
# add in zebrafish gene annotations
if (species == "Dr") {
summarized_data <-
inner_join(summarized_data,
read_tsv("/home/gdkendalllab/lab/references/annotations/zebrafish_gene_info.txt.gz",
show_col_types = FALSE,
col_select = c("Gene_stable_ID_version",
"Gene_name")) %>%
mutate(Gene = Gene_name) %>%
distinct()) %>%
mutate(Gene = Gene_name) %>%
select(-Gene_name, -Gene_stable_ID_version) %>%
group_by(Gene) %>%
distinct()
}
write_tsv(summarized_data, paste0("output/de/",
de_cmp_table$group_1[i],
"_",
de_cmp_table$group_2[i],
"_edgeR.txt"))
## Plots
### LogFC
logFC_plot_list[[i]] <-
ggplot(results$table, aes(x = logFC)) +
geom_histogram(bins = 300) +
ggtitle(paste0(de_cmp_table$group_1[i],
" vs ",
de_cmp_table$group_2[i],
" logFC Histogram"))
names(logFC_plot_list)[i] <- paste(de_cmp_table$group_1[i],
"vs",
de_cmp_table$group_2[i])
ggsave(paste0("output/figures/logFC_histogram_",
de_cmp_table$group_1[i],
"_",
de_cmp_table$group_2[i],
".png"),
logFC_plot_list[[i]],
width = 6,
height = 6)
x_mean_colname <- paste0(de_cmp_table$group_1[i], "_mean")
y_mean_colname <- paste0(de_cmp_table$group_2[i], "_mean")
x_max <- (max(summarized_data[[x_mean_colname]]) / 1000) %>%
ceiling() * 1000
y_max <- (max(summarized_data[[y_mean_colname]]) / 1000) %>%
ceiling() * 1000
### Scatterplot
# Using "local" here because the variable x_mean_colname in the plot list
# uses only the final value outside of the loop
scatterplot_list[[i]] <-
local({
x_mean_colname <- x_mean_colname
y_mean_colname <- y_mean_colname
x_max <- x_max
y_max <- y_max
ggplot(summarized_data,
aes(x = get(x_mean_colname),
y = get(y_mean_colname))) +
geom_line(data = tibble({{ x_mean_colname }} := c(0, x_max),
{{ y_mean_colname }} := c(0, y_max)),
color = "red") +
geom_point(alpha = 0.5, size = 0.5) +
scale_x_log10() +
scale_y_log10() +
labs(x = paste0(de_cmp_table$group_1[i], " mean"),
y = paste0(de_cmp_table$group_2[i], " mean"),
title = paste0(de_cmp_table$group_1[i],
" vs ",
de_cmp_table$group_2[i],
" Scatterplot"))
})
names(scatterplot_list)[i] <- paste(de_cmp_table$group_1[i],
"vs",
de_cmp_table$group_2[i])
ggsave(paste0("output/figures/scatterplot_",
de_cmp_table$group_1[i],
"_",
de_cmp_table$group_2[i],
".png"),
scatterplot_list[[i]],
width = 6,
height = 6)
### Volcano
volcano_plot_list[[i]] <-
ggplot(summarized_data, aes(x = logFC,
y = -log10(PValue),
color = signif)) +
geom_point(alpha = 0.1) +
scale_color_brewer(palette = "Set2", name = "Signif.") +
labs(y = "-log10(p-value)",
title = paste0(de_cmp_table$group_1[i],
" vs ",
de_cmp_table$group_2[i],
" Volcano Plot"))
names(volcano_plot_list)[[i]] <- paste(de_cmp_table$group_1[i],
"vs",
de_cmp_table$group_2[i])
ggsave(paste0("output/figures/volcano_plot_",
de_cmp_table$group_1[i],
"_",
de_cmp_table$group_2[i],
".png"),
volcano_plot_list[[i]],
width = 10,
height = 6)
}
## Joining, by = "gene"
## Joining, by = "Gene"
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Joining, by = "gene"
## Joining, by = "Gene"
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Transformation introduced infinite values in continuous y-axis
orthologs <- read_tsv("../refs/ORTHOLOGY-ALLIANCE_COMBINED.tsv.gz",
show_col_types = FALSE,
comment = "#") %>%
filter(Gene2SpeciesName == "Danio rerio" &
Gene1SpeciesName == "Mus musculus") %>%
rename(mouse_gene = Gene1Symbol,
zebrafish_gene = Gene2Symbol) %>%
select(mouse_gene, zebrafish_gene) %>%
distinct()
# In the zfin ortholog, arf6b has no mouse ortholog, but from other sources, it
# is a homolog of Arf6, so I'm going to manually edit that here
# Also, ensemble has Pfn2 as the mouse ortholog of pnf2l, so I'll also include that
orthologs <- bind_rows(tibble(mouse_gene = c("Arf6",
"Pfn2"),
zebrafish_gene = c("arf6b",
"pfn2l")),
orthologs)
# This will duplicate rows where there are more than one human ortholog per zebrafish gene
fish_genes <- read_tsv("VGLL2NCOA2_27genelist.txt",
show_col_types = FALSE) %>%
rename(zebrafish_gene = gene) %>%
left_join(orthologs)
summary(fish_genes$mouse_gene %in% summarized_data$Gene)
combined_data <- left_join(fish_genes,
summarized_data %>%
rename(mouse_gene = Gene,
mouse_logFC = logFC) %>%
select(mouse_gene,
mouse_logFC,
signif,
Xenograft_C2C12_VGLL2_NCOA2_mean,
Xenograft_C2C12_pcDNA_mean)) %>%
select(zebrafish_gene,
mouse_gene,
mouse_logFC,
signif,
mean_of_tumor_fpkm,
mean_of_development,
mean_of_muscle,
Xenograft_C2C12_VGLL2_NCOA2_mean,
Xenograft_C2C12_pcDNA_mean) %>%
mutate(zebrafish_logFC = log2(mean_of_tumor_fpkm / mean_of_muscle))
combined_data %>%
na.omit() %>%
select(mouse_logFC,
zebrafish_logFC,
signif,
zebrafish_gene,
mouse_gene) %>%
ggplot(aes(x = mouse_logFC,
y = zebrafish_logFC,
color = signif)) +
geom_point() +
ggrepel::geom_text_repel(aes(label = paste(zebrafish_gene,
mouse_gene,
sep = "/"))) +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
scale_color_brewer(palette = "Set2",
name = "Significant\nin mouse") +
labs(x = "Mouse logFC",
y = "Zebrafish logFC",
title = "Differential Expression in Zebrafish and Mouse Tumors")
ggsave(file = "output/figures/zebrafish_mouse_diff_expression.png",
width = 7,
height = 7)
png(filename = "output/figures/zebrafish_mouse_diff_expression_heatmap.png",
width = 2000,
height = 2500,
res = 300)
combined_data %>%
mutate(gene = paste(zebrafish_gene, mouse_gene)) %>%
select(gene,
mean_of_tumor_fpkm,
mean_of_muscle,
mean_of_development,
Xenograft_C2C12_VGLL2_NCOA2_mean,
Xenograft_C2C12_pcDNA_mean) %>%
rename_with(~ str_replace(.x, "mean_of_", "Zebrafish ") %>%
str_remove("_fpkm") %>%
str_remove("_mean") %>%
str_replace("Xeno", "Mm xeno") %>%
str_replace("pcDNA", "ctrl") %>%
str_replace_all("_", " ")) %>%
distinct() %>%
column_to_rownames("gene") %>%
as.matrix() %>%
pheatmap::pheatmap(scale = "none",
display_numbers = TRUE,
number_color = "#9c9c9c",
legend = FALSE,
angle_col = 45,
color = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7,
name = "YlOrRd")))(100))
dev.off()
SkM from GTEX
irdb <- read_tsv("iRDB_Arf6_RNAseq.txt", comment = "#") %>%
mutate(diagnosis_short = fct_reorder(diagnosis_short, FPKM)) %>%
select(diagnosis_short,
FPKM) %>%
na.omit() %>%
full_join(read_tsv("misc/iRDB_DEPMAP_key.txt")) %>%
left_join(read_csv("CRISPR_DepMap_22Q1_Public_Score_Chronos_subsetted_Arf6.csv") %>%
select(lineage_2, ARF6) %>%
distinct())
ggplot(irdb, aes(x = diagnosis_short, y = FPKM)) +
geom_boxplot() +
geom_jitter(alpha = 0.5) +
labs(x = "Diagnosis",
y = "FPKM",
title = "Tumor FPKM data from St. Jude iRDB")
ggplot(irdb, aes(x = ARF6, y = FPKM)) +
geom_point()
test <- read_csv("misc/ARF6 Expression 22Q1 Public.csv") %>%
rename(depmap_id = `Depmap ID`,
ARF6_expr = `Expression 22Q1 Public`,
cell_line_display_name = `Cell Line Name`) %>%
full_join(read_csv("CRISPR_DepMap_22Q1_Public_Score_Chronos_subsetted_Arf6.csv") %>%
select(depmap_id,
cell_line_display_name,
lineage_1,
lineage_2,
lineage_3,
ARF6)) %>%
rename(ARF6_gene_effect = ARF6)
ggplot(test, aes(x = ARF6_gene_effect,
y = ARF6_expr,
color = lineage_2 == "Rhabdomyosarcoma")) +
geom_point() +
geom_point(data = test %>%
filter(lineage_2 == "Rhabdomyosarcoma"),
aes(x = ARF6_gene_effect,
y = ARF6_expr),
color = "#fc8d62",
size = 3) +
ggrepel::geom_text_repel(data = test %>%
filter(lineage_2 == "Rhabdomyosarcoma"),
aes(x = ARF6_gene_effect,
y = ARF6_expr,
label = cell_line_display_name),
color = "black",
size = 2) +
labs(x = "ARF6 Gene Effect",
y = "Expression 22Q1 Public",
title = "Expression of ARF6 vs DepMap Effect") +
scale_color_brewer(palette = "Set2",
name = "RMS")
ggsave("output/figures/depmap_arf6_expression.png",
width = 6,
height = 5)
read_tsv("VGLL2NCOA2_27genelist.txt", show_col_types = FALSE) %>%
select(gene) %>%
left_join(read_tsv("/gpfs0/home1/gdkendalllab/lab/analyses/mvc002/refs/zebrafish_gene_info.txt.gz") %>%
select(`Gene name`, `Gene stable ID`) %>%
distinct() %>%
rename(gene = `Gene name`,
gene_stable_id = `Gene stable ID`)
/gpfs0/home1/gdkendalllab/lab/analyses/mvc002/refs/zebrafish_gene_info.txt.gz
/gpfs0/home1/gdkendalllab/lab/analyses/mvc002/refs/fasta/danRer11_gene_promoters.fasta.gz
get_data <- function(sample_name) {
filter_cmd <- paste0("cat output/fusionAligned/",
sample_name,
"Aligned.out.sam | grep NCOA | awk '$4 <= 341 && $8 >= 441 {print $_}'")
data_df <- system(filter_cmd, intern = TRUE) %>%
as.data.frame() %>%
separate(.,
col = `.`,
sep = "\t",
into = c(NA, NA, NA, "pos_1", "mapq", NA, "chr_2", "pos_2", NA, NA, NA, NA, NA, NA, NA)) %>%
filter(chr_2 == "=" &
mapq > 200) %>%
distinct()
if (nrow(data_df) > 0) {
plot_name <- data_df %>%
mutate(pos_1 = as.numeric(pos_1),
pos_2 = as.numeric(pos_2)) %>%
arrange(pos_1, pos_2) %>%
mutate(row_num = 1:n()) %>%
select(pos_1, pos_2, row_num) %>%
pivot_longer(cols = -row_num, names_to = "pos", values_to = "base") %>%
ggplot(aes(x = as.numeric(base), y = row_num, group = row_num)) +
geom_vline(xintercept = 391,
color = "red") +
geom_point() +
geom_line() +
ggtitle(sample_name) +
labs(x = "Base", y = "")
print(plot_name)
}
return(data_df)
}
samples <- c("A377-A378T1",
"A377-A378T2",
"A377-A378T3",
"A377-A378T4",
"A377-A378T5",
"A377-A378T6",
"A377-A378T7",
"A377-A378T8")
names(samples) <- samples
pdf("output/figures/fusionMappedPos.pdf")
fusion_mapped <- lapply(samples, function(x) get_data(x) %>% nrow())
dev.off()
fusion_mapped <- tibble(fusion_count = as.numeric(fusion_mapped),
sample_name = names(fusion_mapped))
get_reads_cmd <- "grep 'Uniquely mapped reads number' output/aligned/*Log.final.out"
fusion_data <- tibble(data_in = system(get_reads_cmd, intern = TRUE)) %>%
separate(data_in,
sep = "\t",
into = c("sample_name", "reads_mapped")) %>%
mutate(sample_name = sample_name %>%
str_remove("output/aligned/") %>%
str_remove("Log.final.out:.+"),
reads_mapped = as.numeric(reads_mapped),
M_reads_mapped = reads_mapped / 1000000) %>%
full_join(fusion_mapped,
by = "sample_name") %>%
mutate(fusions_per_M = fusion_count / M_reads_mapped) %>%
select(-M_reads_mapped)
fusion_data %>%
gt::gt() %>%
gt::fmt_number(columns = 2, decimals = 0, sep_mark = ",") %>%
gt::fmt_number(columns = 4, decimals = 2) %>%
gt::cols_label(sample_name = "Sample",
reads_mapped = "Reads Mapped",
fusion_count = "Fusions Detected",
fusions_per_M = md("Fusions per Million Reads")) %>%
gt::cols_width(fusions_per_M ~ px(125),
fusion_count ~ px(100)) %>%
gt::gtsave(file = "output/figures/fusionsMapped.png")
de_results <-
full_join(fusion_data, read_tsv("output/MmTumors_edgeR.txt", show_col_types = FALSE) %>%
column_to_rownames("Gene") %>%
filter(signif == "TRUE" & logFC > 0) %>%
select(starts_with("A377")) %>%
select(-`A377-A378T7`, -`A377-A378T8`) %>%
t() %>%
as.data.frame() %>%
rownames_to_column("sample_name"))
ggplot(de_results, aes(x = fusions_per_M, y = Tspan33)) +
geom_smooth(method = "lm", se = FALSE) +
geom_point() +
ggtitle("Tspan33")
cor.test(de_results$fusions_per_M, de_results$Wisp1)
cor_testing <- tibble(p_val = lapply(colnames(de_results)[5:ncol(de_results)],
function(x) cor.test(de_results$fusions_per_M,
de_results[[x]])$p.value) %>%
as.numeric(),
pearson_cor = lapply(colnames(de_results)[5:ncol(de_results)],
function(x) cor.test(de_results$fusions_per_M,
de_results[[x]])$estimate) %>%
as.numeric(),
gene = colnames(de_results)[5:ncol(de_results)]) %>%
arrange(p_val) %>%
mutate(fdr = qvalue::qvalue(p_val)$qvalues)
names(test) <- colnames(de_results)[5:ncol(de_results)]
Keep this chunk last in the analysis since it borks my namespace >:(
Compare: - VGLL2NCOA2 human and fish tumors vs. mature skeletal muscle - VGLL2NCOA2 human and fish tumors vs. KRAS driven RMS - Mouse VGLL2NCOA2 and fish tumors vs. mature skeletal muscle# Need data frame with rownames equal to the column names for the gene expression data
# and columns showing the groups
library(Biobase)
library(AGDEX)
cmp_table <- tibble(group_1 = c("DrFusion_DrSkMu",
"DrFusion_DrSkMu",
"DrFusion_DrSkMu",
"HsFusion_HsSkMu",
"HsFusion_HsSkMu"),
group_2 = c("HsFusion_HsSkMu",
"MmFusion_MmSkMu",
"DrKras_DrKrasCtrl",
"DrKras_DrKrasCtrl",
"MmFusion_MmSkMu"))
species_full <- list(Hs = "Homo sapiens",
Dr = "Danio rerio",
Mm = "Mus musculus")
for (i in seq_len(nrow(cmp_table))) {
comparison_1 <- cmp_table$group_1[i]
comparison_2 <- cmp_table$group_2[i]
species_1 <- species_full[[substr(comparison_1, 1, 2)]]
species_2 <- species_full[[substr(comparison_2, 1, 2)]]
if (species_1 != species_2) {
# get homologs
orthologs <- read_tsv("../refs/ORTHOLOGY-ALLIANCE_COMBINED.tsv.gz",
show_col_types = FALSE,
comment = "#") %>%
dplyr::filter(Gene1SpeciesName == species_1 &
Gene2SpeciesName == species_2) %>%
dplyr::select(group_1_gene = Gene1Symbol,
group_2_gene = Gene2Symbol) %>%
distinct()
} else {
orthologs <- read_tsv("../refs/ORTHOLOGY-ALLIANCE_COMBINED.tsv.gz",
show_col_types = FALSE,
comment = "#") %>%
dplyr::filter(Gene1SpeciesName == species_1) %>%
dplyr::select(group_1_gene = Gene1Symbol,
group_2_gene = Gene1Symbol) %>%
distinct()
}
# Get normalized data
data_1 <- read_tsv(paste0("output/de/",
cmp_table$group_1[i],
"_edgeR.txt"),
show_col_types = FALSE) %>%
dplyr::select(-matches("_mean"),
-logFC,
-logCPM,
-PValue,
-FDR,
-signif) %>%
dplyr::filter(Gene %in% orthologs$group_1_gene) %>%
column_to_rownames("Gene")
data_2 <- read_tsv(paste0("output/de/",
cmp_table$group_2[i],
"_edgeR.txt"),
show_col_types = FALSE) %>%
dplyr::select(-matches("_mean"),
-logFC,
-logCPM,
-PValue,
-FDR,
-signif) %>%
dplyr::filter(Gene %in% orthologs$group_2_gene) %>%
column_to_rownames("Gene")
group_1_pheno = data.frame(row.names = colnames(data_1),
grp = readRDS(paste0("output/de/",
comparison_1,
"_groups.rds")))
group_2_pheno = data.frame(row.names = colnames(data_2),
grp = readRDS(paste0("output/de/",
comparison_2,
"_groups.rds")))
# Do de testing on each dataset
group_1_dex_set <-
make.dex.set.object(
Eset.data = ExpressionSet(assayData = as.matrix(data_1),
phenoData = AnnotatedDataFrame(group_1_pheno)),
comp.var = "grp",
comp.def = str_replace(comparison_1, "_", "-"),
gset.collection = NULL)
group_2_dex_set <-
make.dex.set.object(
Eset.data = ExpressionSet(assayData = as.matrix(data_2),
phenoData = AnnotatedDataFrame(group_2_pheno)),
comp.var = "grp",
comp.def = str_replace(comparison_2, "_", "-"),
gset.collection = NULL)
# Make up list of map.data
map_data <- list(probe.map = orthologs,
map.Aprobe.col = 1,
map.Bprobe.col = 2)
# Run agdex
agdex_res <- AGDEX::agdex(dex.setA = group_1_dex_set,
dex.setB = group_2_dex_set,
map.data = map_data,
min.nperms = 5000,
max.nperms = 10000)
write_tsv(agdex_res$gwide.agdex.result,
file = paste0("output/agdex/",
comparison_1,
"_vs_",
comparison_2,
".txt"))
# agdex_res$meta.dex.res %>%
# mutate(signif = paste(dpvalA <= 0.05, dpvalB <= 0.05),
# direction = paste(dstatA > 0, dstatB > 0)) %>%
# dplyr::filter(signif != "FALSE FALSE") %>%
# ggplot(aes(x = dstatA, y = dstatB, color = direction)) +
# geom_point(alpha = 0.3, size = 1) +
# #scale_x_log10() +
# #scale_y_log10() +
# facet_wrap(~direction) +
# xlim(-1000, 1000) +
# ylim(-1000, 1000) +
# labs(x = comparison_1,
# y = comparison_2)
png(paste0("output/agdex/",
comparison_1,
"_vs_",
comparison_2,
".png"),
width = 3000,
height = 3000,
res = 300)
agdex.scatterplot(agdex_res)
dev.off()
merged_results <-
read_tsv(paste0("output/de/",
cmp_table$group_1[i],
"_edgeR.txt"),
show_col_types = FALSE) %>%
dplyr::select(Gene,
logFC,
signif) %>%
dplyr::filter(Gene %in% orthologs$group_1_gene) %>%
dplyr::rename(group_1_gene = Gene,
g1_logfc = logFC,
g1_signif = signif) %>%
left_join(orthologs, by = "group_1_gene") %>%
inner_join(read_tsv(paste0("output/de/",
cmp_table$group_2[i],
"_edgeR.txt"),
show_col_types = FALSE) %>%
dplyr::select(Gene,
matches("_mean"),
logFC,
signif) %>%
dplyr::filter(Gene %in% orthologs$group_2_gene) %>%
dplyr::rename(group_2_gene = Gene,
g2_logfc = logFC,
g2_signif = signif) %>%
left_join(orthologs)) %>%
mutate(signif = paste(g1_signif, g2_signif),
direction = if_else(g1_logfc > 0 & g2_logfc > 0,
"up",
if_else(g1_logfc < 0 & g2_logfc < 0,
"down",
"discordant"))) %>%
group_by(direction) %>%
mutate(direction = str_c(direction, " (", n(), ")"))
lm_test <- lm(g2_logfc ~ g1_logfc, data = merged_results) %>%
summary()
cor_test <- cor.test(merged_results$g1_logfc,
merged_results$g2_logfc,
method = "pearson")
# Calculate the proportion of genes are in the upper right and lower left of the plot
prop_q2_q3 <- sum((merged_results$g1_logfc > 0 &
merged_results$g2_logfc > 0) |
(merged_results$g1_logfc < 0 &
merged_results$g2_logfc < 0)) / nrow(merged_results)
# Permute over random samples from kernal density function
g1_density <- density(merged_results$g1_logfc, n = 2000)
g2_density <- density(merged_results$g2_logfc, n = 2000)
n_perm <- 100000
n_perm_higher <- 0
for (i in seq_len(n_perm)) {
g1_perm <- sample(g1_density$x,
nrow(merged_results),
prob = g1_density$y,
replace = TRUE) +
rnorm(nrow(merged_results), 0, g1_density$bw)
g2_perm <- sample(g2_density$x,
nrow(merged_results),
prob = g2_density$y,
replace = TRUE) +
rnorm(nrow(merged_results), 0, g1_density$bw)
perm_prop <- sum((g1_perm > 0 & g2_perm > 0) |
(g1_perm < 0 & g2_perm < 0)) /
nrow(merged_results)
if (perm_prop >= prop_q2_q3) {
n_perm_higher <- n_perm_higher + 1
}
if (i %% 1000 == 0) {
message("permutation: ", i)
}
}
perm_p <- n_perm_higher/ n_perm
if (perm_p == 0) {
perm_p <- paste0("<", 1/n_perm)
}
ggplot(merged_results, aes(x = g1_logfc,
y = g2_logfc,
color = direction)) +
geom_point(alpha = 0.4, size = 1) +
labs(x = comparison_1,
y = comparison_2,
title = paste0(comparison_1,
" vs ",
comparison_2,
"\nSlope:",
round(lm_test$coefficients[2], digits =2),
", R-squared:",
round(lm_test$r.squared, digits = 2),
"\nPearson: ",
round(cor_test$estimate, digits = 2),
", p-value: ",
sprintf("%.2e", cor_test$p.value),
"\nPerm p-value: ",
perm_p)) +
scale_color_manual(values = c("gray",
"#2D2DE5",
"#D32821"),
name = "Direction") +
geom_smooth(method = "lm",
color = "black",
se = FALSE)
ggsave(paste0("output/agdex/",
comparison_1,
"_vs_",
comparison_2,
"_raw_data.png"),
height = 8,
width = 10)
}
## Preparing differential expression data sets (dex.setA and dex.setB): Wed Jun 8 16:44:31 2022
## Preparing data that maps probe set IDs across experiments: Wed Jun 8 16:44:32 2022
## Computing statistics for observed data: Wed Jun 8 16:44:32 2022
## Permuting experiment A data 10000 times: Wed Jun 8 16:44:32 2022
## Computing time for permutation analysis of data set A (in seconds):
## 10.928 0.007 10.963 0 0
## Permuting experiment B data 10000 times: Wed Jun 8 16:44:43 2022
## Computing time for permutation analysis of data set B (in seconds):
## 153.61 0.055 154.2 0 0
## Done: Wed Jun 8 16:47:17 2022
## Joining, by = "group_2_gene"Joining, by = c("group_1_gene", "group_2_gene")permutation: 1000
## permutation: 2000
## permutation: 3000
## permutation: 4000
## permutation: 5000
## permutation: 6000
## permutation: 7000
## permutation: 8000
## permutation: 9000
## permutation: 10000
## permutation: 11000
## permutation: 12000
## permutation: 13000
## permutation: 14000
## permutation: 15000
## permutation: 16000
## permutation: 17000
## permutation: 18000
## permutation: 19000
## permutation: 20000
## permutation: 21000
## permutation: 22000
## permutation: 23000
## permutation: 24000
## permutation: 25000
## permutation: 26000
## permutation: 27000
## permutation: 28000
## permutation: 29000
## permutation: 30000
## permutation: 31000
## permutation: 32000
## permutation: 33000
## permutation: 34000
## permutation: 35000
## permutation: 36000
## permutation: 37000
## permutation: 38000
## permutation: 39000
## permutation: 40000
## permutation: 41000
## permutation: 42000
## permutation: 43000
## permutation: 44000
## permutation: 45000
## permutation: 46000
## permutation: 47000
## permutation: 48000
## permutation: 49000
## permutation: 50000
## permutation: 51000
## permutation: 52000
## permutation: 53000
## permutation: 54000
## permutation: 55000
## permutation: 56000
## permutation: 57000
## permutation: 58000
## permutation: 59000
## permutation: 60000
## permutation: 61000
## permutation: 62000
## permutation: 63000
## permutation: 64000
## permutation: 65000
## permutation: 66000
## permutation: 67000
## permutation: 68000
## permutation: 69000
## permutation: 70000
## permutation: 71000
## permutation: 72000
## permutation: 73000
## permutation: 74000
## permutation: 75000
## permutation: 76000
## permutation: 77000
## permutation: 78000
## permutation: 79000
## permutation: 80000
## permutation: 81000
## permutation: 82000
## permutation: 83000
## permutation: 84000
## permutation: 85000
## permutation: 86000
## permutation: 87000
## permutation: 88000
## permutation: 89000
## permutation: 90000
## permutation: 91000
## permutation: 92000
## permutation: 93000
## permutation: 94000
## permutation: 95000
## permutation: 96000
## permutation: 97000
## permutation: 98000
## permutation: 99000
## permutation: 100000
## `geom_smooth()` using formula 'y ~ x'
## Preparing differential expression data sets (dex.setA and dex.setB): Wed Jun 8 16:53:40 2022
## Preparing data that maps probe set IDs across experiments: Wed Jun 8 16:53:41 2022
## Computing statistics for observed data: Wed Jun 8 16:53:41 2022
## Permuting experiment A data 10000 times: Wed Jun 8 16:53:41 2022
## Computing time for permutation analysis of data set A (in seconds):
## 9.595 0.009 9.627 0 0
## Permuting experiment B data 10000 times: Wed Jun 8 16:53:50 2022
## Computing time for permutation analysis of data set B (in seconds):
## 9.613 0.001 9.639 0 0
## Done: Wed Jun 8 16:54:00 2022
## Joining, by = "group_2_gene"Joining, by = c("group_1_gene", "group_2_gene")permutation: 1000
## permutation: 2000
## permutation: 3000
## permutation: 4000
## permutation: 5000
## permutation: 6000
## permutation: 7000
## permutation: 8000
## permutation: 9000
## permutation: 10000
## permutation: 11000
## permutation: 12000
## permutation: 13000
## permutation: 14000
## permutation: 15000
## permutation: 16000
## permutation: 17000
## permutation: 18000
## permutation: 19000
## permutation: 20000
## permutation: 21000
## permutation: 22000
## permutation: 23000
## permutation: 24000
## permutation: 25000
## permutation: 26000
## permutation: 27000
## permutation: 28000
## permutation: 29000
## permutation: 30000
## permutation: 31000
## permutation: 32000
## permutation: 33000
## permutation: 34000
## permutation: 35000
## permutation: 36000
## permutation: 37000
## permutation: 38000
## permutation: 39000
## permutation: 40000
## permutation: 41000
## permutation: 42000
## permutation: 43000
## permutation: 44000
## permutation: 45000
## permutation: 46000
## permutation: 47000
## permutation: 48000
## permutation: 49000
## permutation: 50000
## permutation: 51000
## permutation: 52000
## permutation: 53000
## permutation: 54000
## permutation: 55000
## permutation: 56000
## permutation: 57000
## permutation: 58000
## permutation: 59000
## permutation: 60000
## permutation: 61000
## permutation: 62000
## permutation: 63000
## permutation: 64000
## permutation: 65000
## permutation: 66000
## permutation: 67000
## permutation: 68000
## permutation: 69000
## permutation: 70000
## permutation: 71000
## permutation: 72000
## permutation: 73000
## permutation: 74000
## permutation: 75000
## permutation: 76000
## permutation: 77000
## permutation: 78000
## permutation: 79000
## permutation: 80000
## permutation: 81000
## permutation: 82000
## permutation: 83000
## permutation: 84000
## permutation: 85000
## permutation: 86000
## permutation: 87000
## permutation: 88000
## permutation: 89000
## permutation: 90000
## permutation: 91000
## permutation: 92000
## permutation: 93000
## permutation: 94000
## permutation: 95000
## permutation: 96000
## permutation: 97000
## permutation: 98000
## permutation: 99000
## permutation: 100000
## `geom_smooth()` using formula 'y ~ x'
## Preparing differential expression data sets (dex.setA and dex.setB): Wed Jun 8 16:59:27 2022
## Preparing data that maps probe set IDs across experiments: Wed Jun 8 16:59:28 2022
## Computing statistics for observed data: Wed Jun 8 16:59:28 2022
## Permuting experiment A data 10000 times: Wed Jun 8 16:59:28 2022
## Computing time for permutation analysis of data set A (in seconds):
## 10.126 0.001 10.157 0 0
## Permuting experiment B data 20 times: Wed Jun 8 16:59:39 2022
## Computing time for permutation analysis of data set B (in seconds):
## 0.014 0 0.014 0 0
## Done: Wed Jun 8 16:59:39 2022
## Joining, by = "group_2_gene"Joining, by = c("group_1_gene", "group_2_gene")permutation: 1000
## permutation: 2000
## permutation: 3000
## permutation: 4000
## permutation: 5000
## permutation: 6000
## permutation: 7000
## permutation: 8000
## permutation: 9000
## permutation: 10000
## permutation: 11000
## permutation: 12000
## permutation: 13000
## permutation: 14000
## permutation: 15000
## permutation: 16000
## permutation: 17000
## permutation: 18000
## permutation: 19000
## permutation: 20000
## permutation: 21000
## permutation: 22000
## permutation: 23000
## permutation: 24000
## permutation: 25000
## permutation: 26000
## permutation: 27000
## permutation: 28000
## permutation: 29000
## permutation: 30000
## permutation: 31000
## permutation: 32000
## permutation: 33000
## permutation: 34000
## permutation: 35000
## permutation: 36000
## permutation: 37000
## permutation: 38000
## permutation: 39000
## permutation: 40000
## permutation: 41000
## permutation: 42000
## permutation: 43000
## permutation: 44000
## permutation: 45000
## permutation: 46000
## permutation: 47000
## permutation: 48000
## permutation: 49000
## permutation: 50000
## permutation: 51000
## permutation: 52000
## permutation: 53000
## permutation: 54000
## permutation: 55000
## permutation: 56000
## permutation: 57000
## permutation: 58000
## permutation: 59000
## permutation: 60000
## permutation: 61000
## permutation: 62000
## permutation: 63000
## permutation: 64000
## permutation: 65000
## permutation: 66000
## permutation: 67000
## permutation: 68000
## permutation: 69000
## permutation: 70000
## permutation: 71000
## permutation: 72000
## permutation: 73000
## permutation: 74000
## permutation: 75000
## permutation: 76000
## permutation: 77000
## permutation: 78000
## permutation: 79000
## permutation: 80000
## permutation: 81000
## permutation: 82000
## permutation: 83000
## permutation: 84000
## permutation: 85000
## permutation: 86000
## permutation: 87000
## permutation: 88000
## permutation: 89000
## permutation: 90000
## permutation: 91000
## permutation: 92000
## permutation: 93000
## permutation: 94000
## permutation: 95000
## permutation: 96000
## permutation: 97000
## permutation: 98000
## permutation: 99000
## permutation: 100000
## `geom_smooth()` using formula 'y ~ x'
## Preparing differential expression data sets (dex.setA and dex.setB): Wed Jun 8 17:05:19 2022
## Preparing data that maps probe set IDs across experiments: Wed Jun 8 17:05:20 2022
## Computing statistics for observed data: Wed Jun 8 17:05:20 2022
## Permuting experiment A data 10000 times: Wed Jun 8 17:05:21 2022
## Computing time for permutation analysis of data set A (in seconds):
## 152.716 0.053 153.221 0 0
## Permuting experiment B data 20 times: Wed Jun 8 17:07:54 2022
## Computing time for permutation analysis of data set B (in seconds):
## 0.014 0 0.014 0 0
## Done: Wed Jun 8 17:07:54 2022
## Joining, by = "group_2_gene"Joining, by = c("group_1_gene", "group_2_gene")permutation: 1000
## permutation: 2000
## permutation: 3000
## permutation: 4000
## permutation: 5000
## permutation: 6000
## permutation: 7000
## permutation: 8000
## permutation: 9000
## permutation: 10000
## permutation: 11000
## permutation: 12000
## permutation: 13000
## permutation: 14000
## permutation: 15000
## permutation: 16000
## permutation: 17000
## permutation: 18000
## permutation: 19000
## permutation: 20000
## permutation: 21000
## permutation: 22000
## permutation: 23000
## permutation: 24000
## permutation: 25000
## permutation: 26000
## permutation: 27000
## permutation: 28000
## permutation: 29000
## permutation: 30000
## permutation: 31000
## permutation: 32000
## permutation: 33000
## permutation: 34000
## permutation: 35000
## permutation: 36000
## permutation: 37000
## permutation: 38000
## permutation: 39000
## permutation: 40000
## permutation: 41000
## permutation: 42000
## permutation: 43000
## permutation: 44000
## permutation: 45000
## permutation: 46000
## permutation: 47000
## permutation: 48000
## permutation: 49000
## permutation: 50000
## permutation: 51000
## permutation: 52000
## permutation: 53000
## permutation: 54000
## permutation: 55000
## permutation: 56000
## permutation: 57000
## permutation: 58000
## permutation: 59000
## permutation: 60000
## permutation: 61000
## permutation: 62000
## permutation: 63000
## permutation: 64000
## permutation: 65000
## permutation: 66000
## permutation: 67000
## permutation: 68000
## permutation: 69000
## permutation: 70000
## permutation: 71000
## permutation: 72000
## permutation: 73000
## permutation: 74000
## permutation: 75000
## permutation: 76000
## permutation: 77000
## permutation: 78000
## permutation: 79000
## permutation: 80000
## permutation: 81000
## permutation: 82000
## permutation: 83000
## permutation: 84000
## permutation: 85000
## permutation: 86000
## permutation: 87000
## permutation: 88000
## permutation: 89000
## permutation: 90000
## permutation: 91000
## permutation: 92000
## permutation: 93000
## permutation: 94000
## permutation: 95000
## permutation: 96000
## permutation: 97000
## permutation: 98000
## permutation: 99000
## permutation: 100000
## `geom_smooth()` using formula 'y ~ x'
## Preparing differential expression data sets (dex.setA and dex.setB): Wed Jun 8 17:14:06 2022
## Preparing data that maps probe set IDs across experiments: Wed Jun 8 17:14:07 2022
## Computing statistics for observed data: Wed Jun 8 17:14:08 2022
## Permuting experiment A data 10000 times: Wed Jun 8 17:14:08 2022
## Computing time for permutation analysis of data set A (in seconds):
## 178.813 0.055 179.528 0 0
## Permuting experiment B data 10000 times: Wed Jun 8 17:17:07 2022
## Computing time for permutation analysis of data set B (in seconds):
## 10.76 0.009 10.801 0 0
## Done: Wed Jun 8 17:17:18 2022
## Joining, by = "group_2_gene"Joining, by = c("group_1_gene", "group_2_gene")permutation: 1000
## permutation: 2000
## permutation: 3000
## permutation: 4000
## permutation: 5000
## permutation: 6000
## permutation: 7000
## permutation: 8000
## permutation: 9000
## permutation: 10000
## permutation: 11000
## permutation: 12000
## permutation: 13000
## permutation: 14000
## permutation: 15000
## permutation: 16000
## permutation: 17000
## permutation: 18000
## permutation: 19000
## permutation: 20000
## permutation: 21000
## permutation: 22000
## permutation: 23000
## permutation: 24000
## permutation: 25000
## permutation: 26000
## permutation: 27000
## permutation: 28000
## permutation: 29000
## permutation: 30000
## permutation: 31000
## permutation: 32000
## permutation: 33000
## permutation: 34000
## permutation: 35000
## permutation: 36000
## permutation: 37000
## permutation: 38000
## permutation: 39000
## permutation: 40000
## permutation: 41000
## permutation: 42000
## permutation: 43000
## permutation: 44000
## permutation: 45000
## permutation: 46000
## permutation: 47000
## permutation: 48000
## permutation: 49000
## permutation: 50000
## permutation: 51000
## permutation: 52000
## permutation: 53000
## permutation: 54000
## permutation: 55000
## permutation: 56000
## permutation: 57000
## permutation: 58000
## permutation: 59000
## permutation: 60000
## permutation: 61000
## permutation: 62000
## permutation: 63000
## permutation: 64000
## permutation: 65000
## permutation: 66000
## permutation: 67000
## permutation: 68000
## permutation: 69000
## permutation: 70000
## permutation: 71000
## permutation: 72000
## permutation: 73000
## permutation: 74000
## permutation: 75000
## permutation: 76000
## permutation: 77000
## permutation: 78000
## permutation: 79000
## permutation: 80000
## permutation: 81000
## permutation: 82000
## permutation: 83000
## permutation: 84000
## permutation: 85000
## permutation: 86000
## permutation: 87000
## permutation: 88000
## permutation: 89000
## permutation: 90000
## permutation: 91000
## permutation: 92000
## permutation: 93000
## permutation: 94000
## permutation: 95000
## permutation: 96000
## permutation: 97000
## permutation: 98000
## permutation: 99000
## permutation: 100000
## `geom_smooth()` using formula 'y ~ x'
Styled with knitrBootstrap